
Add generation benchmark spec replay support #43

Draft
mikeylong wants to merge 3 commits into main from codex/zero-shot-benchmark-replay-qwen



@mikeylong
Collaborator

Summary

  • split benchmark artifacts into reusable spec and per-run manifests
  • add replay support, baseline-primary guidance, and model attribution to generation benchmarks
  • extend benchmark reporting with platform, consumer, and model metadata
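The spec/run-manifest split summarized above can be sketched roughly as follows. This is a minimal, hypothetical illustration in JavaScript (to match the Node test suite). It assumes that a spec holds the shared prompts and generation settings, while each run manifest references its spec by ID and records per-run platform, consumer, and model metadata. The names `createSpec`, `createRunManifest`, `replay`, and the `name@version` spec ID format are all assumptions for illustration, not this PR's actual API or schema.

```javascript
// Hypothetical sketch: a reusable benchmark spec plus per-run manifests.
// All names and fields are illustrative assumptions, not the PR's real schema.

// A spec captures *what* is benchmarked and can be shared across many runs.
function createSpec({ name, version, prompts, generation }) {
  return Object.freeze({
    specId: `${name}@${version}`, // stable key that run manifests point back to
    name,
    version,
    prompts: [...prompts],         // copied so later caller mutations don't leak in
    generation: { ...generation }, // e.g. temperature, token limits
  });
}

// A run manifest records one execution of a spec and where/how it ran.
function createRunManifest(spec, { runId, platform, consumer, model }) {
  return {
    runId,
    specId: spec.specId, // a reference, not a copy, so the spec stays reusable
    platform,            // e.g. OS/arch of the machine that ran it
    consumer,            // which client or tool consumed the generations
    model,               // attribution: which model produced the outputs
    startedAt: new Date().toISOString(),
  };
}

// Replay: resolve the spec a manifest points at and build a new run manifest,
// keeping prompts/settings fixed while letting per-run fields be overridden.
function replay(manifest, specsById, overrides = {}) {
  const spec = specsById.get(manifest.specId);
  if (!spec) {
    throw new Error(`unknown spec ${manifest.specId}`);
  }
  return createRunManifest(spec, {
    runId: overrides.runId ?? `${manifest.runId}-replay`,
    platform: overrides.platform ?? manifest.platform,
    consumer: overrides.consumer ?? manifest.consumer,
    model: overrides.model ?? manifest.model,
  });
}

// Usage: record a baseline run, then replay the same spec against another model.
const spec = createSpec({
  name: "zero-shot",
  version: 1,
  prompts: ["p1", "p2"],
  generation: { temperature: 0 },
});
const specs = new Map([[spec.specId, spec]]);
const base = createRunManifest(spec, {
  runId: "run-1",
  platform: "linux-x64",
  consumer: "cli",
  model: "baseline-model",
});
const rerun = replay(base, specs, { model: "qwen3.5-9b" });
```

Keeping the manifest as a reference to the spec rather than a copy is what makes replay cheap in this sketch: a replay swaps only the per-run fields (here, the model), while the prompts and generation settings come unchanged from the shared spec.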

Testing

  • node --test test/generation-session.test.mjs test/generation-benchmark.test.mjs

Notes

  • Structural coverage is green locally.
  • A live local-LLM qwen3.5-9b replay cohort is still pending as follow-up evidence.
